Wald (Inverse Gaussian) distribution (wald)#
The Wald distribution is a classic continuous model for positive, right-skewed waiting times. It is most famous as the first-passage time distribution of a Brownian motion with drift (a core story in sequential analysis and drift–diffusion models).
In SciPy, `scipy.stats.wald` is the standardized Wald (a special case of the inverse Gaussian). In the literature, "Wald distribution" is often used interchangeably with the inverse Gaussian distribution.
1) Title & classification#
| Item | Value |
|---|---|
| Name | Wald / inverse Gaussian (`wald`, `invgauss`) |
| Type | Continuous |
| Support (standard) | \(x \in (0,\infty)\) |
| Parameter space (inverse Gaussian form) | \((\mu,\lambda) \in (0,\infty)\times(0,\infty)\) |
| SciPy | standardized case \(\mu=1,\ \lambda=1\) with optional `loc` and `scale` |
What you’ll be able to do after this notebook#
recognize when a Wald / inverse Gaussian model makes sense (especially first-passage times)
write the PDF/CDF and compute key moments
interpret how \((\mu,\lambda)\) change shape (and how SciPy parameterizes them)
derive mean/variance (via the MGF) and write down the log-likelihood
sample from an inverse Gaussian using NumPy only
use `scipy.stats.wald` and `scipy.stats.invgauss` for simulation and fitting
Notebook roadmap#
Title & classification
Intuition & motivation
Formal definition (PDF/CDF)
Moments & properties
Parameter interpretation
Derivations (\(\mathbb{E}[X]\), \(\mathrm{Var}(X)\), likelihood)
Sampling & simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy integration (`scipy.stats.wald`, `scipy.stats.invgauss`)
Statistical use cases
Pitfalls
Summary
import math
import numpy as np
import scipy
from scipy import stats
from scipy.special import log_ndtr
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
SEED = 123
rng = np.random.default_rng(SEED)
np.set_printoptions(precision=4, suppress=True)
print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
numpy 1.26.2
scipy 1.15.0
plotly 6.5.2
Prerequisites & notation#
Prerequisites
comfort with calculus (differentiation, basic integrals)
probability basics (PDF/CDF, expectation, likelihood)
optional: familiarity with Brownian motion / diffusion models
Notation (inverse Gaussian parameterization)
We use parameters \((\mu,\lambda)\) with
\(\mu>0\) = mean parameter
\(\lambda>0\) = shape (sometimes called “scale” in older sources)
and write \(X \sim \mathrm{IG}(\mu,\lambda)\) for this parameterization.
We also use \(\Phi(\cdot)\) for the standard normal CDF.
SciPy mapping
`scipy.stats.wald` is the standardized case (no shape parameters):

$$
f(x)=\frac{1}{\sqrt{2\pi x^3}}\exp\!\left(-\frac{(x-1)^2}{2x}\right),\quad x>0.
$$

For the general inverse Gaussian \(\mathrm{IG}(\mu,\lambda)\), use `scipy.stats.invgauss` with `invgauss(mu=mu/lam, scale=lam, loc=0)`.
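As a quick sanity check of this mapping (a minimal sketch using the imports from the setup cell above): the standardized Wald should coincide with `invgauss(1.0, scale=1.0)`, i.e. \(\mathrm{IG}(1,1)\).

# Sanity check: wald == IG(1, 1) == invgauss(1.0, scale=1.0)
xs_chk = np.linspace(0.1, 5.0, 50)
rv_ig11 = stats.invgauss(1.0, scale=1.0)
print("pdf match:", np.allclose(stats.wald.pdf(xs_chk), rv_ig11.pdf(xs_chk)))
print("cdf match:", np.allclose(stats.wald.cdf(xs_chk), rv_ig11.cdf(xs_chk)))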
2) Intuition & motivation#
What it models#
The Wald / inverse Gaussian distribution is a model for a positive time to reach a threshold when progress is noisy but has a consistent drift.
A canonical story:
Let \(B_t\) be standard Brownian motion.
Consider a drift–diffusion process

$$
X_t = \nu t + \sigma B_t,
$$

with drift \(\nu>0\) and diffusion scale \(\sigma>0\).

Define the first-passage time to level \(a>0\):

$$
T = \inf\{t\ge 0 : X_t = a\}.
$$

Then \(T\) has an inverse Gaussian distribution:

$$
T \sim \mathrm{IG}\!\left(\frac{a}{\nu},\ \frac{a^2}{\sigma^2}\right).
$$
So:
larger drift \(\nu\) means the threshold is reached sooner (smaller \(\mu\))
larger noise \(\sigma\) makes the time more variable (smaller \(\lambda\)); see the simulation sketch below
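To make the story concrete, here is a minimal Monte Carlo sketch. The threshold `a`, drift `nu`, noise `sigma`, and step size `dt` are illustrative choices, and the Euler discretization introduces a small bias, so the simulated hitting times should only roughly match the \(\mathrm{IG}(a/\nu,\ a^2/\sigma^2)\) mean and variance.

# Monte Carlo sketch of the first-passage story (illustrative a, nu, sigma, dt).
# Euler-discretize X_t = nu*t + sigma*B_t and record the first grid time with X_t >= a.
a, nu, sigma = 2.0, 1.5, 0.8
dt, n_steps, n_paths = 1e-3, 20_000, 4_000
pos = np.zeros(n_paths)
t_hit = np.full(n_paths, np.nan)
alive = np.ones(n_paths, dtype=bool)
for k in range(1, n_steps + 1):
    pos[alive] += nu * dt + sigma * math.sqrt(dt) * rng.standard_normal(alive.sum())
    just_hit = alive & (pos >= a)
    t_hit[just_hit] = k * dt
    alive &= ~just_hit
    if not alive.any():
        break
t_sim = t_hit[~np.isnan(t_hit)]
mu_fp, lam_fp = a / nu, a**2 / sigma**2
print("simulated mean/var:", float(t_sim.mean()), float(t_sim.var()))
print("IG theory mean/var:", mu_fp, mu_fp**3 / lam_fp)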
Typical real-world use cases#
Response times in cognitive models (drift–diffusion / sequential probability ratio tests)
Time-to-failure / fatigue-life models (some engineering contexts)
Queueing / transport: positive travel times with asymmetric right tails
Finance: first-hitting times of drifted diffusions
Relations to other distributions#
The Wald is a special case of the inverse Gaussian (`invgauss` in SciPy).
For large \(\lambda\) (small relative variance), \(\mathrm{IG}(\mu,\lambda)\) is approximately Normal (see the check after this list):

$$
X \approx \mathcal{N}\!\left(\mu,\ \frac{\mu^3}{\lambda}\right).
$$
The inverse Gaussian is used as a mixing distribution in normal variance–mean mixtures, yielding the normal-inverse-Gaussian family.
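A quick numerical illustration of the Normal approximation (the values of `mu_n` and `lam_n` below are illustrative): for large \(\lambda\), IG and Normal quantiles nearly agree.

# IG(mu, lam) vs its Normal approximation N(mu, mu^3/lam) for large lambda.
mu_n, lam_n = 1.0, 200.0
rv_ig_n = stats.invgauss(mu_n / lam_n, scale=lam_n)
rv_norm = stats.norm(loc=mu_n, scale=np.sqrt(mu_n**3 / lam_n))
qs = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
print("IG quantiles    :", rv_ig_n.ppf(qs))
print("Normal quantiles:", rv_norm.ppf(qs))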
3) Formal definition#
PDF (inverse Gaussian form)#
For \(\mu>0\), \(\lambda>0\), the inverse Gaussian density is

$$
f(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^3}}\,\exp\!\left(-\frac{\lambda (x-\mu)^2}{2\mu^2 x}\right),\quad x>0.
$$
Wald as a special case#
The standardized Wald distribution corresponds to \(\mu=1\), \(\lambda=1\):

$$
f(x) = \frac{1}{\sqrt{2\pi x^3}}\exp\!\left(-\frac{(x-1)^2}{2x}\right),\quad x>0.
$$
CDF#
Let \(\Phi\) denote the standard normal CDF. For \(x>0\),

$$
F(x;\mu,\lambda) = \Phi\!\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu}-1\right)\right) + e^{2\lambda/\mu}\,\Phi\!\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu}+1\right)\right).
$$
def invgauss_logpdf(x, mu, lam):
"""Log-PDF of IG(mu, lam) on x>0 (mu>0, lam>0)."""
x = np.asarray(x, dtype=float)
mu = float(mu)
lam = float(lam)
if mu <= 0 or lam <= 0:
return np.full_like(x, -np.inf, dtype=float)
out = np.full_like(x, -np.inf, dtype=float)
mask = x > 0
xm = x[mask]
out[mask] = (
0.5 * (np.log(lam) - np.log(2.0 * np.pi))
- 1.5 * np.log(xm)
- (lam * (xm - mu) ** 2) / (2.0 * mu**2 * xm)
)
return out
def invgauss_pdf(x, mu, lam):
return np.exp(invgauss_logpdf(x, mu, lam))
def invgauss_cdf(x, mu, lam):
"""CDF of IG(mu, lam) using a numerically-stable log-space form."""
x = np.asarray(x, dtype=float)
mu = float(mu)
lam = float(lam)
out = np.zeros_like(x, dtype=float)
if mu <= 0 or lam <= 0:
out[:] = np.nan
return out
mask = x > 0
xm = x[mask]
z1 = np.sqrt(lam / xm) * (xm / mu - 1.0)
z2 = -np.sqrt(lam / xm) * (xm / mu + 1.0)
# F = Phi(z1) + exp(2*lam/mu) * Phi(z2)
log_term1 = log_ndtr(z1)
log_term2 = (2.0 * lam / mu) + log_ndtr(z2)
out[mask] = np.exp(np.logaddexp(log_term1, log_term2))
return out
def wald_pdf(x):
return invgauss_pdf(x, mu=1.0, lam=1.0)
def wald_cdf(x):
return invgauss_cdf(x, mu=1.0, lam=1.0)
# Quick checks vs SciPy
xs = np.linspace(0.05, 6.0, 7)
print("wald pdf match:", np.allclose(wald_pdf(xs), stats.wald.pdf(xs)))
print("wald cdf match:", np.allclose(wald_cdf(xs), stats.wald.cdf(xs)))
# General IG(mu, lam) via scipy.stats.invgauss(mu/lam, scale=lam)
mu_test, lam_test = 1.7, 4.2
rv_ig = stats.invgauss(mu_test / lam_test, scale=lam_test)
print("IG pdf match:", np.allclose(invgauss_pdf(xs, mu_test, lam_test), rv_ig.pdf(xs)))
print("IG cdf match:", np.allclose(invgauss_cdf(xs, mu_test, lam_test), rv_ig.cdf(xs)))
wald pdf match: True
wald cdf match: True
IG pdf match: True
IG cdf match: True
4) Moments & properties#
For \(X\sim\mathrm{IG}(\mu,\lambda)\):
| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X] = \mu\) |
| Variance | \(\mathrm{Var}(X)=\dfrac{\mu^3}{\lambda}\) |
| Skewness | \(\gamma_1 = 3\sqrt{\dfrac{\mu}{\lambda}}\) |
| (Excess) kurtosis | \(\gamma_2 = 15\dfrac{\mu}{\lambda}\) (so kurtosis \(= 3+\gamma_2\)) |
A useful scale-free summary is the coefficient of variation:

$$
\mathrm{CV} = \frac{\sqrt{\mathrm{Var}(X)}}{\mathbb{E}[X]} = \sqrt{\frac{\mu}{\lambda}}.
$$
MGF and characteristic function#
For \(t < \lambda/(2\mu^2)\), the MGF is

$$
M_X(t) = \exp\!\left(\frac{\lambda}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2 t}{\lambda}}\right)\right).
$$
The characteristic function follows by substituting \(t\mapsto it\).
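A small Monte Carlo check of this formula (the choice `t = 0.3` is arbitrary, but it lies inside the convergence region \(t < \lambda/(2\mu^2) = 2/3\) for these parameters):

# Compare the closed-form MGF to a Monte Carlo estimate of E[exp(t X)].
mu_m, lam_m = 1.5, 3.0
t = 0.3  # requires t < lam/(2 mu^2)
mgf_closed = math.exp((lam_m / mu_m) * (1.0 - math.sqrt(1.0 - 2.0 * mu_m**2 * t / lam_m)))
xs_m = stats.invgauss(mu_m / lam_m, scale=lam_m).rvs(size=200_000, random_state=rng)
print("MGF closed form :", mgf_closed)
print("MGF Monte Carlo :", float(np.mean(np.exp(t * xs_m))))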
Entropy#
The differential entropy is \(h(X) = -\mathbb{E}[\log f(X;\mu,\lambda)]\), which does not simplify to a short elementary closed form. In practice, compute it numerically (SciPy provides `.entropy()`), or estimate it by Monte Carlo by averaging \(-\log f(x_i)\) over samples, as done below.
mu, lam = 1.5, 3.0
# Theory
mean_th = mu
var_th = mu**3 / lam
skew_th = 3.0 * math.sqrt(mu / lam)
exkurt_th = 15.0 * (mu / lam)
print("theory mean,var,skew,exkurt:", (mean_th, var_th, skew_th, exkurt_th))
# SciPy check (invgauss(mu/lam, scale=lam) corresponds to IG(mu, lam))
rv = stats.invgauss(mu / lam, scale=lam)
mean_sp, var_sp, skew_sp, exkurt_sp = rv.stats(moments="mvsk")
print("scipy mean,var,skew,exkurt:", (float(mean_sp), float(var_sp), float(skew_sp), float(exkurt_sp)))
# Entropy: SciPy vs Monte Carlo estimate
h_scipy = float(rv.entropy())
samples = rv.rvs(size=80_000, random_state=rng)
h_mc = float(-np.mean(invgauss_logpdf(samples, mu, lam)))
print("entropy scipy:", h_scipy)
print("entropy MC :", h_mc)
theory mean,var,skew,exkurt: (1.5, 1.125, 2.121320343559643, 7.5)
scipy mean,var,skew,exkurt: (1.5, 1.125, 2.121320343559643, 7.5)
entropy scipy: 1.1683115761812808
entropy MC : 1.1673168314822207
5) Parameter interpretation#
Meaning of parameters#
\(\mu\) sets the typical scale: it is the mean and (often) close to the mode when the distribution is concentrated.
\(\lambda\) controls concentration around \(\mu\):

$$
\mathrm{Var}(X)=\frac{\mu^3}{\lambda} \quad\Rightarrow\quad \lambda\uparrow \;\Longrightarrow\; \text{smaller variance and less skew.}
$$
A helpful way to think about shape is via the ratio \(\mu/\lambda\):
relative spread: \(\mathrm{CV}=\sqrt{\mu/\lambda}\)
skewness: \(3\sqrt{\mu/\lambda}\)
Shape changes#
Fix \(\mu\) and increase \(\lambda\): the density becomes sharply peaked near \(\mu\) and looks increasingly Normal.
Fix \(\lambda\) and increase \(\mu\): the mean shifts right and dispersion grows quickly (since variance scales like \(\mu^3\)).
# PDF: varying mu (keep lambda fixed)
lam_fixed = 3.0
mus = [0.6, 1.0, 1.8, 3.0]
x_max = max(stats.invgauss(mu / lam_fixed, scale=lam_fixed).ppf(0.995) for mu in mus)
x = np.linspace(1e-6, float(x_max), 900)
fig = go.Figure()
for mu_i in mus:
fig.add_trace(
go.Scatter(
x=x,
y=invgauss_pdf(x, mu_i, lam_fixed),
mode="lines",
name=f"μ={mu_i}, λ={lam_fixed}",
)
)
fig.update_layout(
title="Inverse Gaussian PDF: varying μ (λ fixed)",
xaxis_title="x",
yaxis_title="pdf",
)
fig.show()
# PDF: varying lambda (keep mu fixed)
mu_fixed = 1.2
lams = [0.5, 1.0, 3.0, 10.0]
x_max = max(stats.invgauss(mu_fixed / lam, scale=lam).ppf(0.995) for lam in lams)
x = np.linspace(1e-6, float(x_max), 900)
fig = go.Figure()
for lam_i in lams:
fig.add_trace(
go.Scatter(
x=x,
y=invgauss_pdf(x, mu_fixed, lam_i),
mode="lines",
name=f"μ={mu_fixed}, λ={lam_i}",
)
)
fig.update_layout(
title="Inverse Gaussian PDF: varying λ (μ fixed)",
xaxis_title="x",
yaxis_title="pdf",
)
fig.show()
6) Derivations#
Expectation and variance (via cumulant generating function)#
Let \(K(t)=\log M_X(t)\), where

$$
M_X(t) = \exp\!\left(\frac{\lambda}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2 t}{\lambda}}\right)\right),
\qquad\text{so}\qquad
K(t) = \frac{\lambda}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2 t}{\lambda}}\right).
$$
Cumulants are derivatives at \(t=0\):
\(K'(0)=\mathbb{E}[X]\)
\(K''(0)=\mathrm{Var}(X)\)
Differentiate:

$$
K'(t) = \mu\left(1 - \frac{2\mu^2 t}{\lambda}\right)^{-1/2},
\qquad K'(0) = \mu = \mathbb{E}[X].
$$

Differentiate again:

$$
K''(t) = \frac{\mu^3}{\lambda}\left(1 - \frac{2\mu^2 t}{\lambda}\right)^{-3/2},
\qquad K''(0) = \frac{\mu^3}{\lambda} = \mathrm{Var}(X).
$$
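A finite-difference sanity check of these derivatives (the step size `h` is an arbitrary small number):

# Verify K'(0) = mu and K''(0) = mu^3/lam via central differences.
mu_k, lam_k = 1.5, 3.0

def K(t):
    return (lam_k / mu_k) * (1.0 - math.sqrt(1.0 - 2.0 * mu_k**2 * t / lam_k))

h = 1e-4
print("K'(0)  ~", (K(h) - K(-h)) / (2 * h), " vs mu        =", mu_k)
print("K''(0) ~", (K(h) - 2 * K(0.0) + K(-h)) / h**2, " vs mu^3/lam =", mu_k**3 / lam_k)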
Likelihood#
For i.i.d. data \(x_1,\dots,x_n\) from \(\mathrm{IG}(\mu,\lambda)\), the log-likelihood is

$$
\ell(\mu,\lambda) = \frac{n}{2}\log\lambda - \frac{n}{2}\log(2\pi) - \frac{3}{2}\sum_{i=1}^n \log x_i - \frac{\lambda}{2\mu^2}\sum_{i=1}^n \frac{(x_i-\mu)^2}{x_i}.
$$
A well-known MLE result (for the inverse Gaussian form) is:

$$
\hat\mu = \bar x, \qquad
\hat\lambda = \frac{n}{\sum_{i=1}^n \dfrac{(x_i-\hat\mu)^2}{\hat\mu^2 x_i}}
= \frac{n}{\sum_{i=1}^n\left(\dfrac{1}{x_i} - \dfrac{1}{\bar x}\right)}.
$$
def invgauss_loglik(x, mu, lam):
x = np.asarray(x, dtype=float)
return float(np.sum(invgauss_logpdf(x, mu, lam)))
def invgauss_mle(x):
x = np.asarray(x, dtype=float)
if np.any(x <= 0):
raise ValueError("All observations must be > 0 for IG(mu, lam).")
n = x.size
mu_hat = float(np.mean(x))
denom = float(np.sum((x - mu_hat) ** 2 / (mu_hat**2 * x)))
lam_hat = float(n / denom)
return mu_hat, lam_hat
def invgauss_mle_lam_given_mu(x, mu_fixed):
x = np.asarray(x, dtype=float)
mu_fixed = float(mu_fixed)
if mu_fixed <= 0:
raise ValueError("mu_fixed must be > 0")
if np.any(x <= 0):
raise ValueError("All observations must be > 0")
n = x.size
return float(n * mu_fixed**2 / np.sum((x - mu_fixed) ** 2 / x))
# Demonstration on synthetic data
mu_true, lam_true = 1.7, 4.0
rv_true = stats.invgauss(mu_true / lam_true, scale=lam_true)
x = rv_true.rvs(size=3000, random_state=rng)
mu_hat, lam_hat = invgauss_mle(x)
print("true (mu, lam):", (mu_true, lam_true))
print("mle (mu, lam):", (mu_hat, lam_hat))
print("loglik at true:", invgauss_loglik(x, mu_true, lam_true))
print("loglik at mle :", invgauss_loglik(x, mu_hat, lam_hat))
true (mu, lam): (1.7, 4.0)
mle (mu, lam): (1.6979415068846022, 3.9342328565348255)
loglik at true: -3765.005001936604
loglik at mle : -3764.79254307707
7) Sampling & simulation#
NumPy-only sampler (Michael–Schucany–Haas)#
A standard exact algorithm to sample \(X\sim\mathrm{IG}(\mu,\lambda)\) uses only Normal and Uniform random variables:
Draw \(V\sim\mathcal{N}(0,1)\) and set \(Y=V^2\).
Form a candidate

$$
X_1 = \mu + \frac{\mu^2 Y}{2\lambda} - \frac{\mu}{2\lambda}\sqrt{4\mu\lambda Y + \mu^2 Y^2}.
$$
Draw \(U\sim\mathrm{Unif}(0,1)\).
Return \(X=X_1\) if \(U\le \frac{\mu}{\mu+X_1}\), else return \(X=\frac{\mu^2}{X_1}\).
This produces exact inverse Gaussian samples and is fast and vectorizable.
def invgauss_rvs_numpy(mu, lam, size=1, rng=None):
"""Sample IG(mu, lam) using NumPy only (Michael–Schucany–Haas)."""
if rng is None:
rng = np.random.default_rng()
mu = float(mu)
lam = float(lam)
if mu <= 0 or lam <= 0:
raise ValueError("mu and lam must be > 0")
v = rng.standard_normal(size)
y = v * v
mu2 = mu * mu
x1 = mu + (mu2 * y) / (2.0 * lam) - (mu / (2.0 * lam)) * np.sqrt(4.0 * mu * lam * y + mu2 * y * y)
u = rng.random(size)
x = np.where(u <= (mu / (mu + x1)), x1, mu2 / x1)
return x
# Quick validation of the sampler
mu, lam = 1.5, 3.0
n = 80_000
s = invgauss_rvs_numpy(mu, lam, size=n, rng=rng)
print("sample mean/var:", (float(np.mean(s)), float(np.var(s, ddof=0))))
print("theory mean/var:", (mu, mu**3 / lam))
sample mean/var: (1.4978110241273188, 1.1084760209581126)
theory mean/var: (1.5, 1.125)
8) Visualization#
We’ll visualize:
the PDF vs a histogram of Monte Carlo samples
the CDF vs the empirical CDF from samples
mu, lam = 1.3, 2.5
rv = stats.invgauss(mu / lam, scale=lam)
samples = invgauss_rvs_numpy(mu, lam, size=60_000, rng=rng)
x_max = float(rv.ppf(0.995))
x = np.linspace(1e-6, x_max, 900)
# PDF + histogram
fig = px.histogram(
x=samples,
nbins=120,
histnorm="probability density",
title="IG(μ, λ): histogram vs theoretical PDF",
labels={"x": "x"},
)
fig.add_trace(go.Scatter(x=x, y=invgauss_pdf(x, mu, lam), mode="lines", name="theory pdf"))
fig.update_layout(bargap=0.02)
fig.show()
# CDF + empirical CDF
xs = np.sort(samples)
ecdf = np.arange(1, xs.size + 1) / xs.size
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=invgauss_cdf(x, mu, lam), mode="lines", name="theory cdf"))
fig.add_trace(go.Scatter(x=xs[::50], y=ecdf[::50], mode="markers", name="empirical cdf", opacity=0.6))
fig.update_layout(
title="IG(μ, λ): empirical CDF vs theoretical CDF",
xaxis_title="x",
yaxis_title="cdf",
)
fig.show()
9) SciPy integration#
scipy.stats.wald#
standardized Wald distribution (no shape parameters)
supports generic `loc` and `scale` transforms
scipy.stats.invgauss#
For the full inverse Gaussian \(\mathrm{IG}(\mu,\lambda)\), use:
rv = scipy.stats.invgauss(mu / lam, scale=lam, loc=0)
This matches the \((\mu,\lambda)\) parameterization used throughout this notebook.
# wald: pdf/cdf/rvs
x = np.linspace(1e-4, 6, 600)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=stats.wald.pdf(x), mode="lines", name="wald pdf"))
fig.update_layout(title="scipy.stats.wald PDF (standardized)", xaxis_title="x", yaxis_title="pdf")
fig.show()
s_wald = stats.wald.rvs(size=10_000, random_state=rng)
print("wald sample mean/var:", (float(np.mean(s_wald)), float(np.var(s_wald))))
# wald.fit fits loc/scale (note: loc/scale are generic, not part of the classic (mu, lam) parameterization)
loc_true, scale_true = 0.2, 1.5
data = stats.wald.rvs(loc=loc_true, scale=scale_true, size=5000, random_state=rng)
loc_hat, scale_hat = stats.wald.fit(data)
print("wald true loc,scale:", (loc_true, scale_true))
print("wald fit loc,scale:", (float(loc_hat), float(scale_hat)))
# Inverse Gaussian (mu, lam) via invgauss
mu_true, lam_true = 1.7, 4.0
rv_ig = stats.invgauss(mu_true / lam_true, scale=lam_true)
data = rv_ig.rvs(size=5000, random_state=rng)
# Fit invgauss; fix loc=0 to match the (mu, lam) form.
mu_shape_hat, loc_hat, scale_hat = stats.invgauss.fit(data, floc=0)
mu_hat = float(mu_shape_hat * scale_hat)
lam_hat = float(scale_hat)
print("IG true (mu, lam):", (mu_true, lam_true))
print("IG fit (mu, lam):", (mu_hat, lam_hat))
wald sample mean/var: (0.9850123489974157, 0.9642769362574072)
wald true loc,scale: (0.2, 1.5)
wald fit loc,scale: (0.19128589639592256, 1.5316481216288031)
IG true (mu, lam): (1.7, 4.0)
IG fit (mu, lam): (1.707057237926007, 4.055512548439454)
10) Statistical use cases#
A) Hypothesis testing (likelihood ratio test for \(\mu\))#
A common inferential question in first-passage-time models is whether the mean time \(\mu\) equals a value implied by a hypothesized drift.
We can test

$$
H_0: \mu = \mu_0 \quad\text{vs}\quad H_1: \mu \ne \mu_0
$$

using the likelihood ratio statistic

$$
\Lambda = 2\left(\ell(\hat\mu,\hat\lambda) - \ell(\mu_0, \hat\lambda_{\mu_0})\right),
$$

which is approximately \(\chi^2_1\) for large \(n\).
B) Bayesian modeling#
Inverse Gaussian likelihoods are common for positive time data. Even without closed-form conjugacy, you can do simple grid-based Bayes or use modern samplers (PyMC/Stan) for hierarchical models.
C) Generative modeling#
The inverse Gaussian is a standard mixing distribution: if \(V\sim\mathrm{IG}\) and

$$
Y\mid V \sim \mathcal{N}(\beta V,\ V),
$$

then \(Y\) has heavier tails than a Gaussian (this leads to the normal-inverse-Gaussian family).
# A) LRT example for mu
mu_true, lam_true = 1.4, 3.5
rv = stats.invgauss(mu_true / lam_true, scale=lam_true)
x = rv.rvs(size=3000, random_state=rng)
mu0 = 1.2
mu_hat, lam_hat = invgauss_mle(x)
lam_hat_mu0 = invgauss_mle_lam_given_mu(x, mu0)
ll_alt = invgauss_loglik(x, mu_hat, lam_hat)
ll_null = invgauss_loglik(x, mu0, lam_hat_mu0)
lrt = 2.0 * (ll_alt - ll_null)
p_value = float(stats.chi2.sf(lrt, df=1))
print("true (mu, lam):", (mu_true, lam_true))
print("H0 mu0:", mu0)
print("MLE (mu, lam):", (mu_hat, lam_hat))
print("LRT statistic:", float(lrt))
print("approx p-value:", p_value)
true (mu, lam): (1.4, 3.5)
H0 mu0: 1.2
MLE (mu, lam): (1.381122312541543, 3.4295590938454583)
LRT statistic: 165.08395022407058
approx p-value: 8.769428012935727e-38
# B) Simple grid Bayesian posterior for mu (treat lambda as known)
mu_true, lam_known = 1.6, 4.0
x = stats.invgauss(mu_true / lam_known, scale=lam_known).rvs(size=400, random_state=rng)
# Prior: log-normal on mu (median 1.5)
prior = stats.lognorm(s=0.35, scale=1.5)
mu_grid = np.linspace(0.4, 3.2, 800)
log_prior = prior.logpdf(mu_grid)
log_like = np.array([invgauss_loglik(x, mu, lam_known) for mu in mu_grid])
log_post_unnorm = log_prior + log_like
# Normalize on the grid (log-sum-exp)
log_post = log_post_unnorm - scipy.special.logsumexp(log_post_unnorm)
post = np.exp(log_post)
post_mean = float(np.sum(mu_grid * post))
cdf_post = np.cumsum(post)
cdf_post /= cdf_post[-1]
ci_low = float(mu_grid[np.searchsorted(cdf_post, 0.025)])
ci_high = float(mu_grid[np.searchsorted(cdf_post, 0.975)])
print("true mu:", mu_true)
print("posterior mean:", post_mean)
print("95% credible interval:", (ci_low, ci_high))
fig = go.Figure()
fig.add_trace(go.Scatter(x=mu_grid, y=post, mode="lines", name="posterior"))
fig.add_vline(x=mu_true, line_dash="dash", line_color="black")
fig.update_layout(title="Posterior for μ (λ known)", xaxis_title="μ", yaxis_title="posterior density (grid)")
fig.show()
true mu: 1.6
posterior mean: 1.6233797987498713
95% credible interval: (1.5284105131414267, 1.7281602002503131)
# C) Generative modeling: normal mean–variance mixture with IG mixing
n = 80_000
mu_v, lam_v = 1.0, 1.5
beta = 0.6
v = invgauss_rvs_numpy(mu_v, lam_v, size=n, rng=rng)
z = rng.standard_normal(n)
# Y | V=v ~ Normal(beta*v, v)
y = beta * v + np.sqrt(v) * z
mean_y = float(np.mean(y))
std_y = float(np.std(y))
print("mixture sample mean/std:", (mean_y, std_y))
# Compare to a Gaussian with the same mean/std
x = np.linspace(mean_y - 5 * std_y, mean_y + 5 * std_y, 800)
fig = px.histogram(y, nbins=160, histnorm="probability density", title="IG mixing yields heavier tails than Gaussian")
fig.add_trace(go.Scatter(x=x, y=stats.norm.pdf(x, loc=mean_y, scale=std_y), mode="lines", name="matched Normal"))
fig.update_layout(bargap=0.02)
fig.show()
mixture sample mean/std: (0.6083134294865697, 1.1183377846342366)
11) Pitfalls#
Parameterization confusion
The literature often uses \((\mu,\lambda)\) (mean + shape).
SciPy uses `wald` for a standardized special case, and `invgauss(mu, scale)` for the general case.
To represent \(\mathrm{IG}(\mu,\lambda)\) in SciPy, use `invgauss(mu/lam, scale=lam, loc=0)`.
Invalid parameters / support
Require \(\mu>0\), \(\lambda>0\), and data \(x_i>0\).
Using `loc` shifts the support to \((\mathrm{loc},\infty)\); fitting `loc` freely can yield surprising results.
Numerical issues
For small \(x\) or large \(\lambda/\mu\), PDF/CDF computations can underflow/overflow.
Prefer `logpdf`, and for the CDF use stable forms (e.g. via `log_ndtr` as above); a short demonstration follows.
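A short illustration (the parameter values below are chosen only to force underflow): in the far tail the PDF underflows to exactly zero, while the log-PDF stays finite and usable in a likelihood.

# pdf underflows to 0.0 in the far tail; logpdf stays finite.
rv_tail = stats.invgauss(0.01, scale=100.0)  # IG(mu=1, lam=100): very concentrated
print("pdf(0.05)   :", rv_tail.pdf(0.05))     # 0.0 (underflow)
print("logpdf(0.05):", rv_tail.logpdf(0.05))  # large negative but finite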
Goodness-of-fit p-values after fitting
If you estimate parameters from the same data, naive KS-test p-values are not calibrated (use a parametric bootstrap, sketched below, if you need a valid GOF p-value).
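A sketch of such a parametric bootstrap (the helper `_fit_ig_shape_scale` is ours, and `n_boot = 200` is kept small for speed): fit the model, simulate replicates from the fit, refit each replicate, and compare the observed KS statistic to the bootstrap distribution.

# Parametric bootstrap calibration of the KS statistic after fitting IG(mu, lam).
def _fit_ig_shape_scale(y):
    m, _, s = stats.invgauss.fit(y, floc=0)
    return m, s

data_gof = stats.invgauss(1.7 / 4.0, scale=4.0).rvs(size=500, random_state=rng)
m_hat, s_hat = _fit_ig_shape_scale(data_gof)
ks_obs = stats.kstest(data_gof, stats.invgauss(m_hat, scale=s_hat).cdf).statistic

n_boot = 200  # increase in practice
ks_boot = np.empty(n_boot)
for b in range(n_boot):
    xb = stats.invgauss(m_hat, scale=s_hat).rvs(size=data_gof.size, random_state=rng)
    mb, sb = _fit_ig_shape_scale(xb)
    ks_boot[b] = stats.kstest(xb, stats.invgauss(mb, scale=sb).cdf).statistic

print("KS statistic      :", float(ks_obs))
print("bootstrap p-value :", float(np.mean(ks_boot >= ks_obs)))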
12) Summary#
The Wald / inverse Gaussian is a continuous distribution on \((0,\infty)\) with mean \(\mu\) and shape \(\lambda\).
It naturally models first-passage times of drifted Brownian motion: \(\mu=a/\nu\) and \(\lambda=a^2/\sigma^2\).
Key moments: \(\mathbb{E}[X]=\mu\), \(\mathrm{Var}(X)=\mu^3/\lambda\), skewness \(=3\sqrt{\mu/\lambda}\).
`scipy.stats.wald` is a standardized special case; for general \((\mu,\lambda)\) use `scipy.stats.invgauss(mu/lam, scale=lam)`.
Exact sampling is easy with a fast NumPy-only algorithm (Michael–Schucany–Haas).